78 research outputs found

    A comparative analysis of 21 literature search engines

    With the increasing number of bibliographic software tools, scientists and health professionals either make a subjective choice of tool(s) that could suit their needs or face the challenge of analyzing multiple features of a plethora of search programs. There is an urgent need for a thorough comparative analysis of the available bio-literature scanning tools from the user’s perspective. We report the results of the first semi-quantitative comparison of 21 programs that can search published (partial or full text) documents in life science areas. The observations can help life science researchers and medical professionals make an informed selection among the programs, depending on their search objectives.
Some of the important findings are:
1. Most of the hits obtained from Scopus, ReleMed, EBIMed, CiteXplore, and HighWire Press were relevant (i.e., these tools showed better precision than other tools).
2. A very high number of relevant citations were retrieved by HighWire Press, Google Scholar, CiteXplore and PubMed Central (i.e., they had better recall).
3. HighWire Press (HWP) and CiteXplore seemed to have a good balance of precision and recall.
4. PubMed Central, PubMed and Scopus provided the most useful query systems.
5. GoPubMed, BioAsk, EBIMed and ClusterMed could be more useful among the tools that can automatically process the retrieved citations for further scanning of bio-entities such as proteins, diseases, tissues, molecular interactions, etc.
The authors suggest the use of PubMed, Scopus, Google Scholar and HighWire Press for better coverage, and GoPubMed to view the hits categorized by MeSH and Gene Ontology terms. The article is relevant to all life science subjects.
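The precision and recall comparisons in the findings above can be illustrated with a minimal sketch. It assumes that, for one query, we know the set of truly relevant citations and the list each tool returned; the citation IDs and the two tools are hypothetical, and the F1 score is only one common way to express the "balance" of precision and recall, not necessarily the measure used in the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query against one search tool."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: fraction of returned hits that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: fraction of all relevant citations that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (an assumed balance measure)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical citation IDs for a single query.
relevant = {"c1", "c2", "c3", "c4", "c5"}
tool_a = ["c1", "c2", "c9"]                                   # few hits, mostly relevant
tool_b = ["c1", "c2", "c3", "c4", "c7", "c8", "c10", "c11"]   # many hits, more noise

for name, hits in (("tool_a", tool_a), ("tool_b", tool_b)):
    p, r = precision_recall(hits, relevant)
    print(name, round(p, 2), round(r, 2), round(f1(p, r), 2))
```

In this toy case the first tool has the higher precision and the second the higher recall, which is the kind of trade-off findings 1 to 3 describe.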

    A novel tissue-specific meta-analysis approach for gene expression predictions, initiated with a mammalian gene expression testis database

    Background: In recent years, there has been a rise in gene expression profiling reports. Unfortunately, it has not been possible to make maximum use of the available gene expression data. Many databases and programs can be used to derive the possible expression patterns of mammalian genes, based on existing data. However, these resources have limitations. For example, it is not possible to obtain a list of genes that are expressed in certain conditions. To overcome such limitations, we have adopted a new strategy to predict gene expression patterns using available information, for one tissue at a time.
Results: The first step of this approach involved manual collection of as much data as possible from large-scale (genome-wide) gene expression studies pertaining to mammalian testis. These data have been compiled into a Mammalian Gene Expression Testis-database (MGEx-Tdb). This process resulted in a richer collection of gene expression data than other databases/resources provide, for multiple testicular conditions. The gene lists collected this way were in turn used to derive a 'consensus' expression status for each gene across studies. The expression information obtained from the newly developed database mostly agreed with results from multiple small-scale studies on selected genes. A comparative analysis showed that MGEx-Tdb can retrieve gene expression information more efficiently than other commonly used databases. It can provide a clear expression status (transcribed or dormant) for most genes in the testis tissue, under several specific physiological/experimental conditions and/or cell types.
Conclusions: Manual compilation of gene expression data, which can be a painstaking process, followed by determination of a consensus expression status for specific locations and conditions, can be a reliable way of using the existing data to predict gene expression patterns. MGEx-Tdb provides expression information for 14 different combinations of specific locations and conditions in humans (25,158 genes), 79 in mice (22,919 genes) and 23 in rats (14,108 genes). It is also the first system that can predict the expression of genes with a 'reliability-score', which is calculated based on the extent of agreements and contradictions across gene-sets/studies. This new platform is publicly available at http://resource.ibab.ac.in/MGEx-Tdb/
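The idea of a consensus expression status with a reliability score can be sketched as follows. The abstract does not give the actual formula used by MGEx-Tdb, so the scoring rule here (net agreement divided by the number of studies), the function name and the gene names are all assumptions made purely for illustration.

```python
from collections import Counter


def consensus_status(calls):
    """Derive a consensus call and a reliability score for one gene.

    calls: list of 'transcribed' / 'dormant' strings, one per study or gene-set.
    The score (|agreements - contradictions| / total) is a hypothetical
    stand-in; the real MGEx-Tdb formula is not stated in the abstract.
    """
    counts = Counter(calls)
    transcribed, dormant = counts["transcribed"], counts["dormant"]
    total = transcribed + dormant
    if total == 0:
        return "unknown", 0.0
    if transcribed == dormant:
        # Studies contradict each other evenly: no clear status.
        return "uncertain", 0.0
    status = "transcribed" if transcribed > dormant else "dormant"
    reliability = abs(transcribed - dormant) / total
    return status, reliability


# Hypothetical per-study calls for two made-up genes in one condition.
study_calls = {
    "GeneA": ["transcribed"] * 4 + ["dormant"],
    "GeneB": ["dormant", "dormant"],
}
for gene, calls in study_calls.items():
    print(gene, consensus_status(calls))
```

Under this assumed rule, a gene reported consistently across studies scores close to 1, while contradictory reports pull the score toward 0.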

    MGEx-Udb: A Mammalian Uterus Database for Expression-Based Cataloguing of Genes across Conditions, Including Endometriosis and Cervical Cancer

    Gene expression profiling of uterus tissue has been performed in various contexts, but a significant amount of the data remains underutilized as it is not covered by the existing general resources. The MGEx-Udb database can be queried with gene names/IDs, sub-tissue locations, and various conditions such as cervical cancer, endometrial cycles and disorders, and experimental treatments. Accordingly, the output is either a) the transcribed and dormant genes listed for the queried condition/location, or b) the expression profile of the gene of interest in various uterine conditions. The results also include the reliability score for the expression status of each gene. MGEx-Udb also provides information related to Gene Ontology annotations, protein-protein interactions, transcripts, promoters, and expression status by other sequencing techniques, and facilitates various other types of analysis of individual genes or co-expressed gene clusters. In brief, MGEx-Udb enables easy cataloguing of co-expressed genes and also facilitates biomarker discovery for various uterine conditions.

    The Biotech-Bioinfo Interface in the Context of Education and Growth of the Biotechnology Industry in India Today

    Abstract: Biotechnology and bioinformatics have a lot in common. Apart from the fact that they have been incorrectly equated with ‘IT’ in terms of careers, knowledge of either of these biological subjects can enhance learning or contributions in the other. Careful attention to the common aspects of biotechnology and bioinformatics (the biotech-bioinfo interface) could improve the value of curricula in both areas. This article identifies the curricular components that would be important not only for biotechnology and bioinformatics courses but also for other life science programs. A few examples of personal research projects are discussed to illustrate the need to include/strengthen the biotech-bioinfo interface in our educational programs. Standards in life science education have to be improved, and we can start by evaluating one aspect at a time. A decade ago, several colleges taught mainly botany and zoology as life science subjects without caring much for the requirements of biologists in the country. Even though the choice of subjects has broadened now, things do not seem to have changed significantly in terms of student employability. One of the reasons may be that we, the people in the education system (see Fig. 1), haven’t learned to keep pace with the changing times and demands. In fact, with less-informed students and parents, the situation seems to have worsened recently. Many colleges have opened post-graduate, and even graduate, programs in new subjects without proper teaching capacities. The lacunae include teachers’ level of expertise and/or the infrastructure. This is not good for the growing Indian biotech industry, which happens to be very diverse. The success of this sector depends a lot on the availability of top-quality human resources. Hence, there is an urgent need to focus on specific aspects of educational programs and start improving each of them.
Certain aspects of life science curricula that involve parts of the following subjects will be discussed here

    ExPrimer: to design primers from exon–exon junctions (Bioinformatics Applications Note, Genome analysis)

    Summary: ExPrimer is a web-based program for designing primers, mainly from a specified exon–exon junction (E-E-jn) of a gene of interest. The tool suggests the optimum primer pair(s), in which the right (reverse) primer represents a particular E-E-jn of the mRNA. The ‘product length’ decides the location of the left primer. The results also include all other primer pairs considered and their ‘scores’. ExPrimer can use the NCBI BLASTn program to check the sequence specificity of primers. The tool is useful in many areas of molecular biology research that involve hybridization of short sequences with mRNA or cDNA.
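The layout described in the summary (a reverse primer sitting across an exon–exon junction, with the left primer placed according to the product length) can be sketched as below. This is not ExPrimer's algorithm: the function names, the choice to centre the primer on the junction, and the default lengths are assumptions, and no scoring, melting-temperature or BLASTn specificity checks are attempted.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (upper-case A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junction_primers(exons, junction, primer_len=20, product_len=100):
    """Place a primer pair around one exon-exon junction of the spliced mRNA.

    exons: exon sequences in 5'->3' order.
    junction: index i of the junction between exons[i] and exons[i + 1].
    The right (reverse) primer is centred on the junction (an assumption);
    the left primer position follows from the requested product length.
    """
    mrna = "".join(exons)
    # Position of the first base of the downstream exon in the mRNA.
    jn_pos = sum(len(e) for e in exons[: junction + 1])
    right_start = jn_pos - primer_len // 2
    right_end = right_start + primer_len
    left_start = right_end - product_len
    if right_start < 0 or right_end > len(mrna) or left_start < 0:
        raise ValueError("sequence too short for the requested lengths")
    left = mrna[left_start : left_start + primer_len]
    right = revcomp(mrna[right_start:right_end])
    return left, right


# Toy exons (hypothetical sequences, not a real gene).
exons = ["ACGT" * 20, "GGCC" * 10]
left, right = junction_primers(exons, junction=0, primer_len=20, product_len=60)
print(left, right)
```

Because the reverse primer only anneals where the two exons are joined, a pair placed this way would be expected to amplify spliced mRNA/cDNA rather than genomic DNA containing the intron.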

    How do you choose your literature search tool(s)?

    With the increasing number of bio-literature search engines, scientists and health professionals either make a subjective choice of tool(s) or face the challenge of analyzing multiple features of a plethora of bibliographic software. There is an urgent need for a thorough comparative analysis of the available literature scanning tools from the user’s perspective. We report the results of the first semi-quantitative comparison of 21 search programs that can search published (partial or full text) documents in life science areas. The observations can help life science researchers and medical professionals make an informed selection among the programs, depending on their search objectives.
Some of the important findings are:
1. Most of the hits obtained from Scopus, ReleMed, EBIMed, CiteXplore, and HighWire Press were relevant (i.e., these tools showed better precision than other tools).
2. A very high number of relevant citations were retrieved by HighWire Press, Google Scholar, CiteXplore and PubMed Central (i.e., they had better recall).
3. HighWire Press (HWP) and CiteXplore seemed to have a good balance of precision and recall.
4. PubMed Central, PubMed and Scopus provided the most useful query systems.
5. GoPubMed, BioAsk, EBIMed and ClusterMed could be more useful among the tools that can automatically process the retrieved citations for further scanning of bio-entities such as proteins, diseases, tissues, molecular interactions, etc.
The authors suggest the use of PubMed, Scopus, Google Scholar and HighWire Press for better coverage, and GoPubMed to view the hits categorized by MeSH and Gene Ontology terms.

    GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species.

    Genome-wide repeat sequences, such as LINEs, SINEs and LTRs, make up a considerable part of mammalian nuclear genomes. These repeat elements seem to be important for multiple functions, including the regulation of transcription initiation, alternative splicing and DNA methylation. However, it is not possible to study all repeats; hence, it would help to short-list them before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. We developed the 'Genomic Repeat Element Analyzer for Mammals' (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web server offers many novel utilities. For example, it is the only tool that can reveal a categorized list of specific types of transposons, retrotransposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury.
GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species. GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/
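What "statistically over-/under-represented" could mean in practice is sketched below with a plain binomial test. The abstract does not say which statistic GREAM uses, so the test, the background probability, the significance threshold and all numbers here are assumptions for illustration only.

```python
from math import comb


def binom_tails(k, n, p):
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(pmf[: k + 1]), sum(pmf[k:])


def classify_repeat(k, n, p, alpha=0.05):
    """Label a repeat element given how many gene neighborhoods contain it.

    k: neighborhoods (out of n) that contain the repeat.
    p: assumed background probability that a comparable region contains it.
    alpha: assumed one-sided significance threshold.
    """
    lower, upper = binom_tails(k, n, p)
    if upper < alpha:
        return "over-represented"
    if lower < alpha:
        return "under-represented"
    return "not significant"


# Hypothetical counts: 50 gene neighborhoods, background probability 0.2.
for observed in (22, 10, 2):
    print(observed, classify_repeat(observed, n=50, p=0.2))
```

A real analysis over many repeat families would also need a multiple-testing correction, which this sketch leaves out.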

    Effective Feature Selection for Classification of Promoter Sequences

    Exploring novel computational methods for making sense of biological data has been not only a necessity but also productive. A part of this trend is the search for more efficient in silico methods/tools for the analysis of promoters, which are parts of DNA sequences involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be largely driven by the selective presence and/or arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way for functional analysis of promoters and for discovering the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features to explore the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show an improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the proposed methods are as accurate as MRMR (a feature selection method) but much faster. Such methods could be useful for categorizing new promoters and exploring regulatory mechanisms of gene expression in complex eukaryotic species.
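The general idea of position-aware motif features followed by feature reduction can be sketched as below. The abstract defines neither the PSMM features nor the two proposed reduction methods, so everything here is a stand-in: the binary (motif, position bin) features, the class-mean-difference filter, the motifs and the toy sequences are assumptions, not the authors' method.

```python
def motif_position_features(seq, motifs, bin_size):
    """Binary feature per (motif, position bin): 1 if the motif starts there."""
    feats = {}
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            feats[(motif, start // bin_size)] = 1
            start = seq.find(motif, start + 1)
    return feats


def select_features(samples, labels, top_k):
    """Keep the top_k features whose mean differs most between the classes.

    samples: list of feature dicts; labels: list of 0/1 class labels.
    This simple filter is a hypothetical stand-in for the paper's methods.
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")

    def score(feature):
        m1 = sum(s.get(feature, 0) for s, y in zip(samples, labels) if y == 1) / n1
        m0 = sum(s.get(feature, 0) for s, y in zip(samples, labels) if y == 0) / n0
        return abs(m1 - m0)

    names = sorted({f for s in samples for f in s})
    # Sort by descending score; ties are broken by feature name for stability.
    return sorted(names, key=lambda f: (-score(f), f))[:top_k]


# Toy promoter-like sequences (hypothetical) in two classes.
motifs = ["TATA", "GGGCGG"]
seqs = ["TATAAAGGGCGGAACC", "TATACCCCAAAACCCC", "CCGGGCGGTATACCCC", "CCCCAAAACCTATACC"]
labels = [1, 1, 0, 0]
samples = [motif_position_features(s, motifs, bin_size=8) for s in seqs]
print(select_features(samples, labels, top_k=2))
```

In this toy set the position of the TATA motif separates the classes while the GGGCGG feature does not, so the filter discards the latter; the reduced feature set could then be passed to any classifier, such as a decision tree.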

    Snapshots of GREAM illustrating use of ‘analyze orthologous gene-set’ on human transcription factor genes and their mouse orthologs from Steinhoff et al [48].

    Snapshots of GREAM illustrating use of ‘analyze orthologous gene-set’ on human transcription factor genes and their mouse orthologs from Steinhoff et al. [48] (reference 48 in doi:10.1371/journal.pone.0133647).